49 research outputs found

Aneuploidy prediction and tumor classification with heterogeneous hidden conditional random fields

Author: Albertson
Beitzinger
Brown
Chin
E. M. Airoldi
Han
Heim
Huang
Jonsson
Nag
O. G. Troyanskaya
R. E. Schapire
Rocke
Rueda
Shah
Snijders
V. Dumeaux
van Beers
Wessels
Yi
Z. Barutcuoglu
Publication venue: Oxford University Press
Publication date: 01/01/2008

Motivation: The heterogeneity of cancer cannot always be recognized by tumor morphology, but may be reflected by the underlying genetic aberrations. Array comparative genome hybridization (array-CGH) methods provide high-throughput data on genetic copy numbers, but determining the clinically relevant copy number changes remains a challenge. Conventional classification methods for linking recurrent alterations to clinical outcome ignore sequential correlations when selecting relevant features. Conversely, existing sequence classification methods can only model overall copy number instability, without regard to any particular position in the genome.

Crossref

Harvard University - DASH

PubMed Central

Munin - Open Research Archive

NORA - Norwegian Open Research Archives

Gene Function Classification Using Bayesian Models with Hierarchy-Based Priors

Author: A Clare
A McCallum
AS Weigend
B Rost
B Schoikowski
B Shahbaba
Babak Shahbaba
BE Engelhardt
D Koller
EM Marcotte
FR Blattner
H Blockeel
I Tsochantaridis
IUBMB
J DeRisi
J Fox
J Goodman
J Struyf
J Zhang
JA Eisen
JR Guest
K Sjölander
L Cai
L Dehaspe
M Brown
M Deng
M Eisen
M Riley
N Cesa-Bianchi
O Dekel
P Pavlidis
R Caruana
R Eisner
Radford M Neal
RD King
RM Neal
S Rison
S Sattath
S Spiro
SF Altschul
ST Dumais
WR Pearson
Z Barutcuoglu
Publication date: 01/01/2006

We investigate the application of hierarchical classification schemes to the annotation of gene function based on several characteristics of protein sequences, including phylogenetic descriptors, sequence-based attributes, and predicted secondary structure. We discuss three Bayesian models and compare their performance in terms of predictive accuracy. These models are the ordinary multinomial logit (MNL) model, a hierarchical model based on a set of nested MNL models, and an MNL model with a prior that introduces correlations between the parameters for classes that are nearby in the hierarchy. We also provide a new scheme for combining different sources of information. We use these models to predict the functional class of Open Reading Frames (ORFs) from the E. coli genome. The results from all three models show substantial improvement over previous methods, which were based on the C5 algorithm. The MNL model using a prior based on the hierarchy outperforms both the non-hierarchical MNL model and the nested MNL model. In contrast to previous attempts at combining these sources of information, our approach results in a higher accuracy rate than models that use each data source alone. Together, these results show that gene function can be predicted with higher accuracy than previously achieved, using Bayesian models that incorporate suitable prior information.
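As a rough illustration of the third model described in this abstract, the Python sketch below builds each class's MNL coefficients as a sum of coefficient vectors attached to the branches on its path through the hierarchy, so that classes sharing an ancestor share part of their parameters. This is one possible way to induce such correlations and is assumed here purely for illustration. The toy hierarchy, the parameter values, and the function names `class_coefficients` and `mnl_probabilities` are hypothetical and are not taken from the paper.

```python
import math

# Hypothetical toy hierarchy: each leaf class lists the branch nodes
# on its path from the root (assumption, not the paper's class scheme).
PATHS = {
    "A1": ["A", "A1"],
    "A2": ["A", "A2"],
    "B1": ["B", "B1"],
}


def class_coefficients(branch_params, paths):
    """Sum the branch-level coefficient vectors along each class's path."""
    coeffs = {}
    for cls, path in paths.items():
        total = [0.0] * len(branch_params[path[0]])
        for node in path:
            for i, value in enumerate(branch_params[node]):
                total[i] += value
        coeffs[cls] = total
    return coeffs


def mnl_probabilities(x, coeffs):
    """Multinomial logit: softmax over the linear scores x . beta_c."""
    scores = {c: sum(xi * bi for xi, bi in zip(x, beta))
              for c, beta in coeffs.items()}
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    norm = sum(exps.values())
    return {c: e / norm for c, e in exps.items()}


# Illustrative fixed values; in a Bayesian treatment these would have priors.
branch_params = {
    "A": [1.0, 0.0], "B": [-1.0, 0.0],
    "A1": [0.0, 0.5], "A2": [0.0, -0.5], "B1": [0.0, 0.0],
}
coeffs = class_coefficients(branch_params, PATHS)
probs = mnl_probabilities([1.0, 1.0], coeffs)
print(probs)
```

Because classes A1 and A2 both include the coefficients of branch A, any prior placed on the branch-level vectors makes their class-level parameters correlated, while B1 shares nothing with them.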

arXiv.org e-Print Archive

University of Toronto Research Repository

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

Multi-Target Prediction: A Unifying View on Problems and Methods

Multi-target prediction (MTP) is concerned with the simultaneous prediction of multiple target variables of diverse types. Due to its enormous application potential, it has developed into an active and rapidly expanding research field that combines several subfields of machine learning, including multivariate regression, multi-label classification, multi-task learning, dyadic prediction, zero-shot learning, network inference, and matrix completion. In this paper, we present a unifying view on MTP problems and methods. First, we formally discuss commonalities and differences between existing MTP problems. To this end, we introduce a general framework that covers the above subfields as special cases. As a second contribution, we provide a structured overview of MTP methods. This is accomplished by identifying a number of key properties, which distinguish such methods and determine their suitability for different types of problems. Finally, we also discuss a few challenges for future research.

arXiv.org e-Print Archive

Crossref

Ghent University Academic Bibliography

GOTA: GO term annotation of biomedical literature

Author: A Doms
A Schlicker
A Singhal
C Blaschke
D Li
DL Rubin
G Salton
Giacomo Domeniconi
Gianluca Moro
J Gobeill
J Lomax
J Rousu
K Verspoor
L Du Plessis
L Hirschman
Luciano Margara
M Ashburner
MF Porter
N Cesa-Bianchi
N Skunca
NR Silla
NS Altman
P Radivojac
Pietro Di Lena
SE Lewis
T Liu
TH Wonnacott
Y Mao
Y Tao
Z Barutcuoglu
Publication venue: Springer Science and Business Media LLC

Crossref

Interspecies gene function prediction using semantic similarity

Author: A Benso
A Holzinger
A Mitrofanova
A Schlicker
BM Good
C Pesquita
CL Myers
D Lee
FM Couto
G Valentini
G Yu
GO Consortium
Guangyuan Fu
Guoxian Yu
H Yang
J Demsar
J Wu
JL Sevilla
Jun Wang
JZ Wang
L Wilcoxon
M Ashburner
M Cao
M Mistry
ML Zhang
MS Alexandra
MW Hahn
N Cesa-Bianchi
OD King
P Legrain
P Radivojac
PD Thomas
PH Guzzi
PW Lord
Q Zou
R Rada
R Sharan
RJ Roberts
S Mostafavi
SY Rhee
Wei Luo
X Zeng
Y Loewenstein
Y Tao
Z Barutcuoglu
Z Teng
Z Wu
Publication venue: Springer Science and Business Media LLC

Crossref

Incorporating functional inter-relationships into protein function prediction algorithms

Author: A Herscovics
A Mateos
A Ruepp
AJ Parodi
ASN Seshasayee
B Shahbaba
C Stark
C Wang
Chad L Myers
CL Myers
D Lin
E Nabieva
F Azuaje
F Reggiori
G Pandey
G Tsoumakas
Gaurav Pandey
H Yu
J Geng
J Helenius
JE Shea
JJ Jiang
JL Sevilla
K Tarassov
M Ashburner
M Kuramochi
M Schuldiner
MP Brown
NJ Krogan
P D'haeseleer
P Resnik
PN Lipke
PN Tan
PW Lord
S Carroll
S Mnaimneh
S Siegel
S Vincenti
SW Stevens
T Gabaldon
T Xu
TR Hughes
Vipin Kumar
Y Tao
Z Barutcuoglu
Publication venue: BioMed Central
Publication date: 07/01/2008

Background: Functional classification schemes (e.g. the Gene Ontology) that serve as the basis for annotation efforts in several organisms are often the source of gold standard information for computational efforts at supervised protein function prediction. While successful function prediction algorithms have been developed, few previous efforts have utilized more than the protein-to-functional class label information provided by such knowledge bases. For instance, the Gene Ontology not only captures protein annotations to a set of functional classes, but it also arranges these classes in a DAG-based hierarchy that captures rich inter-relationships between different classes. These inter-relationships present both opportunities, such as the potential for additional training examples for small classes from larger related classes, and challenges, such as a harder-to-learn distinction between similar GO terms, for standard classification-based approaches. Results: We propose a method to enhance the performance of classification-based protein function prediction algorithms by making use of the inter-relationships between the functional classes that constitute functional classification schemes. Using a standard measure for evaluating the semantic similarity between nodes in an ontology, we quantify and incorporate these inter-relationships into the k-nearest neighbor classifier. We present experiments on several large genomic data sets, each of which is used for the modeling and prediction of over a hundred classes from the GO Biological Process ontology. The results show that this incorporation produces more accurate predictions for a large number of the functional classes considered, and also that the classes that benefit most from this approach are those containing the fewest members. In addition, we show how our proposed framework can be used for integrating information from the entire GO hierarchy for improving the accuracy of predictions made over a set of base classes. Finally, we provide qualitative and quantitative evidence that this incorporation of functional inter-relationships enables the discovery of interesting biology in the form of novel functional annotations for several yeast proteins, such as Sna4, Rtn1 and Lin1. Conclusion: We implemented and evaluated a methodology for incorporating inter-relationships between functional classes into a standard classification-based protein function prediction algorithm. Our results show that this incorporation can help improve the accuracy of such algorithms, and help uncover novel biology in the form of previously unknown functional annotations. The complete source code, a sample data set and the additional files for this paper are available free of charge for non-commercial use at http://www.cs.umn.edu/vk/gaurav/functionalsimilarity/.
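The idea of letting related classes contribute to a k-nearest neighbor vote can be sketched as follows. This Python example is a minimal illustration under stated assumptions, not the authors' implementation: the similarity table `SEM_SIM`, the class names, the toy feature vectors, and the functions `sem_sim` and `knn_score` are all hypothetical. A neighbor labeled with a related class adds a fractional vote equal to its semantic similarity to the target class, instead of adding nothing.

```python
import math

# Hypothetical semantic similarities between functional classes (symmetric).
# In practice these would come from an ontology-based similarity measure.
SEM_SIM = {
    ("transport", "ion transport"): 0.8,
    ("transport", "DNA repair"): 0.1,
    ("ion transport", "DNA repair"): 0.1,
}


def sem_sim(a, b):
    """Look up the similarity of two classes; identical classes score 1.0."""
    if a == b:
        return 1.0
    return SEM_SIM.get((a, b), SEM_SIM.get((b, a), 0.0))


def knn_score(query, training, target, k, use_similarity=True):
    """Score `target` for `query` from its k nearest training examples.

    Without similarity, only neighbors labeled exactly `target` vote.
    With similarity, each neighbor votes with weight sem_sim(target, label).
    """
    neighbours = sorted(training, key=lambda item: math.dist(query, item[0]))[:k]
    total = 0.0
    for _, label in neighbours:
        if use_similarity:
            total += sem_sim(target, label)
        elif label == target:
            total += 1.0
    return total / k


# Toy data: (feature vector, class label).
training = [
    ([0.0, 0.0], "ion transport"),
    ([0.0, 1.0], "transport"),
    ([5.0, 5.0], "DNA repair"),
]
query = [0.0, 0.2]
plain = knn_score(query, training, "transport", k=2, use_similarity=False)
weighted = knn_score(query, training, "transport", k=2, use_similarity=True)
print(plain, weighted)
```

In this toy setting the nearest neighbor carries the related label "ion transport", so the similarity-weighted score for "transport" is higher than the plain vote. This mirrors the abstract's point that small classes can borrow evidence from larger related classes.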

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

University of Minnesota Digital Conservancy

Bayesian aggregation versus majority vote in the characterization of non-specific arm pain based on quantitative needle electromyography

Author: A Hamilton-Wright
Andrew Hamilton-Wright
B Larsson
C Nadeau
Daniel W Stashuk
DV Budescu
DW Stashuk
E Stålberg
G Hagg
G Pfeiffer
GJ Macfarlane
I Kononenko
J Greening
J Lipscomb
JM Harrington
K Calder
K Walker-Bone
Kristina M Calder
Linda McLean
M Urwin
M West
Q McNemar
R Kohavi
RO Duda
RT Clemen
S Podner
SE Larsson
T Rosqvist
VL Durkalski
WF Brown
X Dennett
Z Barutcuoglu
Publication venue: BioMed Central
Publication date: 01/01/2010

Crossref

Springer - Publisher Connector

PubMed Central

Bayesian Markov Random Field Analysis for Protein Function Prediction Based on Network Data

Author: A Kuzniar
A Vazquez
Aalt D. J. van Dijk
AJ Enright
C Moler
Cajo J. F. ter Braak
CJF Ter Braak
CM Federovitch
DJC MacKay
GD Bader
GR Lanckriet
H Lee
I Kosmidis
I Ulitsky
Iddo Friedberg
IM Cheeseman
J Besag
JA Hanley
L Milligan
L Peña Castillo
M Ashburner
M Deng
M Punta
Marco C. A. M. Bink
N Nariai
NJ Mulder
P McCullagh
R Sharan
RI Kondor
Roeland C. H. J. van Ham
S Ferré
S Geman
S Letovsky
S Mostafavi
SF Altschul
SR Collins
SZ Li
T Gabaldon
U Karaoz
V Vethantham
XL Chen
Y Chen
Y Guan
Yiannis A. I. Kourmpetis
Z Barutcuoglu
Z Wei
Publication venue: Public Library of Science
Publication date: 01/01/2010

Inference of protein functions is one of the most important aims of modern biology. To fully exploit the large volumes of genomic data typically produced in modern-day genomic experiments, automated computational methods for protein function prediction are urgently needed. Established methods use sequence or structure similarity to infer functions, but those types of data do not suffice to determine the biological context in which proteins act. Current high-throughput biological experiments produce large amounts of data on the interactions between proteins. Such data can be used to infer interaction networks and to predict the biological process that the protein is involved in. Here, we develop a probabilistic approach for protein function prediction using network data, such as protein-protein interaction measurements. We take a Bayesian approach to an existing Markov Random Field method by performing simultaneous estimation of the model parameters and prediction of protein functions. We use an adaptive Markov Chain Monte Carlo algorithm that leads to more accurate parameter estimates and consequently to improved prediction performance compared to the standard Markov Random Fields method. We tested our method using a high-quality S. cerevisiae validation network with 1622 proteins against 90 Gene Ontology terms of different levels of abstraction. Compared to three other protein function prediction methods, our approach shows very good prediction performance. Our method can be directly applied to protein-protein interaction or coexpression networks, but can also be extended to use multiple data sources. We apply our method to physical protein interaction data from S. cerevisiae and provide novel predictions, using 340 Gene Ontology terms, for 1170 unannotated proteins, and we evaluate the predictions using the available literature.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Wageningen University & Research Publications

Building multiclass classifiers for remote homology detection and fold recognition

Author: A Heger
A Krogh
A Sun
AG Murzin
B Taskar
C Leslie
CA Orengo
CD Huang
CH Ding
D Mittelman
E le
E Lindahl
EL Allwein
F Aiolli
F Rosenblatt
George Karypis
H Rangwala
H Saigo
Huzefa Rangwala
I Tsochantaridis
J Rousu
J Shi
J Weston
K Crammer
L Holm
L Liao
M Collins
M Marti-Renom
P Baldi
R Kuang
R Rifkin
S Altschul
SB Needleman
SE Brenner
T Jaakkola
T Joachims
TF Smith
TG Dietterich
V Vapnik
W Pearson
Y Guermeur
Y Hou
Z Barutcuoglu
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Protein remote homology detection and fold recognition are central problems in computational biology. Supervised learning algorithms based on support vector machines are currently among the most effective methods for solving these problems. These methods are primarily used to solve binary classification problems, and they have not been extensively used to solve the more general multiclass remote homology prediction and fold recognition problems. RESULTS: We present a comprehensive evaluation of a number of methods for building SVM-based multiclass classification schemes in the context of the SCOP protein classification. These methods include schemes that directly build an SVM-based multiclass model, schemes that employ a second-level learning approach to combine the predictions generated by a set of binary SVM-based classifiers, and schemes that build and combine binary classifiers for various levels of the SCOP hierarchy beyond those defining the target classes. CONCLUSION: Analyzing the performance achieved by the different approaches on four different datasets, we show that most of the proposed multiclass SVM-based classification approaches are quite effective in solving the remote homology prediction and fold recognition problems. The schemes that use predictions from binary models constructed for ancestral categories within the SCOP hierarchy tend not only to lead to lower error rates but also to reduce the number of errors in which a superfamily is assigned to an entirely different fold and a fold is predicted as being from a different SCOP class. Our results also show that the limited size of the training data makes it hard to learn complex second-level models, and that models of moderate complexity lead to consistently better results.

CiteSeerX

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

University of Minnesota Digital Conservancy

Directing Experimental Biology: A Case Study in Mitochondrial Biogenesis

Author: A Goffeau
A Jaimovich
A Sickmann
AB Owen
AH Tong
Amy A. Caudy
Andrey Rzhetsky
AV Kochetov
BJ Blencowe
Burke
C Andreoli
C Huttenhower
Chad L. Myers
CL Myers
Curtis Huttenhower
David C. Hess
DC Hess
E Nabieva
F Foury
F Perocchi
G Giaever
GR Lanckriet
H Kitano
H Koutnikova
H Prokisch
I Boldogh
I Lee
IR Boldogh
JB Moseley
JM Cherry
Kai Li
L Peña-Castillo
LM Steinmetz
M Ashburner
M Babcock
M Grunstein
M Ogur
MA Hibbs
Matthew A. Hibbs
OG Troyanskaya
Olga G. Troyanskaya
P Pavlidis
R Jansen
S DiMauro
TR Hughes
Z Barutcuoglu
Publication venue: Public Library of Science
Publication date: 01/01/2009

Computational approaches have promised to organize collections of functional genomics data into testable predictions of gene and protein involvement in biological processes and pathways. However, few such predictions have been experimentally validated on a large scale, leaving many bioinformatic methods unproven and underutilized in the biology community. Further, it remains unclear what biological concerns should be taken into account when using computational methods to drive real-world experimental efforts. To investigate these concerns and to establish the utility of computational predictions of gene function, we experimentally tested hundreds of predictions generated from an ensemble of three complementary methods for the process of mitochondrial organization and biogenesis in Saccharomyces cerevisiae. The biological data with respect to the mitochondria are presented in a companion manuscript published in PLoS Genetics (doi:10.1371/journal.pgen.1000407). Here we analyze and explore the results of this study that are broadly applicable for computationalists applying gene function prediction techniques, including a new experimental comparison with 48 genes representing the genomic background. Our study leads to several conclusions that are important to consider when driving laboratory investigations using computational prediction approaches. While most genes in yeast are already known to participate in at least one biological process, we confirm that genes with known functions can still be strong candidates for annotation of additional gene functions. We find that different analysis techniques and different underlying data can both greatly affect the types of functional predictions produced by computational methods. This diversity allows an ensemble of techniques to substantially broaden the biological scope and breadth of predictions. 
We also find that performing prediction and validation steps iteratively allows us to more completely characterize a biological area of interest. While this study focused on a specific functional area in yeast, many of these observations may be useful in the contexts of other processes and organisms.

Public Library of Science (PLOS)

Crossref

The Jackson Laboratory: The Mouseion at the JAXlibrary

Directory of Open Access Journals

PubMed Central